MATAM: reconstruction of phylogenetic marker genes from short sequencing reads in metagenomes Supplementary Material
Abstract
The default MATAM SSU rRNA reference database is built from the Silva 128 SSU Ref NR99 database [10], which comprises 645 151 prokaryotic 16S rRNA sequences (https://www.arb-silva.de/fileadmin/silva_databases/release_128/Exports/SILVA_128_SSURef_Nr99_tax_silva_trunc.fasta.gz). Sequences containing runs of more than 5 consecutive unknown nucleotides (N) are filtered out, and the remaining Ns are replaced with A nucleotides, yielding 642 903 sequences. The filtered reference database is finally clustered with Sumaclust [7, 4] using semi-global alignment and a 95% identity threshold. All of these steps can be performed with the script matam_db_preprocessing.py provided in the GitHub repository (https://github.com/bonsai-team/matam).
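The filtering and masking steps described above can be illustrated with a minimal Python sketch. This is not the code of matam_db_preprocessing.py; the function names (parse_fasta, filter_and_mask) and the toy sequences are hypothetical, and the sketch assumes that "more than 5" means a run of six or more consecutive Ns. The Sumaclust clustering step is an external tool and is not reproduced here.

```python
import re

# Assumption: a run of six or more consecutive Ns disqualifies a sequence.
LONG_N_RUN = re.compile(r"N{6,}", re.IGNORECASE)


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_and_mask(records):
    """Drop sequences with long N runs, then replace remaining Ns with A."""
    for header, seq in records:
        if LONG_N_RUN.search(seq):
            continue  # filtered out: contains a run of more than 5 Ns
        yield header, re.sub(r"[Nn]", "A", seq)


# Toy input (hypothetical sequences, not taken from Silva).
example = """>seq1
ACGUNNACGU
>seq2
ACGNNNNNNACG
>seq3
NNNNNACGU
""".splitlines()

cleaned = list(filter_and_mask(parse_fasta(example)))
for header, seq in cleaned:
    print(f">{header}\n{seq}")
```

In this toy input, seq2 is discarded because it contains six consecutive Ns, while seq1 and seq3 are kept with their Ns replaced by A.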
Similar resources
MATAM: reconstruction of phylogenetic marker genes from short sequencing reads in metagenomes
Motivation: Advances in the sequencing of uncultured environmental samples, dubbed metagenomics, raise a growing need for accurate taxonomic assignment. Accurate identification of organisms present within a community is essential to understanding even the most elementary ecosystems. However, current high-throughput sequencing technologies generate short reads which partially cover full-length ma...
FragGeneScan: predicting genes in short and error-prone reads
The advances of next-generation sequencing technology have facilitated metagenomics research that attempts to determine directly the whole collection of genetic material within an environmental sample (i.e. the metagenome). Identification of genes directly from short reads has become an important yet challenging problem in annotating metagenomes, since the assembly of metagenomes is often not a...
Genovo: De Novo Assembly for Metagenomes
Next-generation sequencing technologies produce a large number of noisy reads from the DNA in a sample. Metagenomics and population sequencing aim to recover the genomic sequences of the species in the sample, which could be of high diversity. Methods geared towards single sequence reconstruction are not sensitive enough when applied in this setting. We introduce a generative probabilistic mode...
Transcriptome analysis of the freshwater pearl mussel, Hyriopsis cumingii (Lea) using Illumina paired-end sequencing to identify genes and markers
The transcriptome of the triangle sail mussel Hyriopsis cumingii (Lea) was sequenced using Illumina paired-end sequencing technology and analyzed. Equal quantities of total RNA isolated from six tissues, including gonad, hepatopancreas, foot, mantle, gill and adductor muscle, were pooled to construct a cDNA library. A total of 58.09 million clean reads with 98.48 % Q20 bases were generated. Cluster...
Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes.
Metagenomics, the application of shotgun sequencing, facilitates the reconstruction of the genomes of individual species from natural environments. A major challenge in the genome recovery domain is to agglomerate or 'bin' sequences assembled from metagenomic reads into individual groups. Metagenomic binning without consideration of reference sequences enables the comprehensive discovery of new...
Publication date: 2017